generative environment model
Author Response for 'Shaping Belief States with Generative Environment Models for RL'
We are grateful for all the constructive and actionable feedback provided by the reviewers, and we believe we have addressed their key concerns below. We are working to improve our explanations in Section 2.2 based on all feedback. We emphasize that careful empirical experimentation in ML can also bring valuable insights to the community, and studying these factors requires an intersectional empirical study such as this paper. In particular, probabilistic models benefit more from overshooting than deterministic models do.
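To make the overshooting objective concrete, the sketch below rolls a latent dynamics model forward several steps open-loop and penalizes the prediction error of every overshot frame, not just the next one. This is a minimal illustration under our own simplifying assumptions: a deterministic GRU latent model and an MSE objective are used purely for brevity (the rebuttal's point is precisely that stochastic models benefit more), and names like `Dynamics`, `overshoot_loss`, and the horizon `K` are illustrative, not the authors' architecture.

```python
# Minimal sketch of latent overshooting (illustrative, not the paper's model):
# from each timestep, roll the latent forward up to K steps using only actions
# and penalize the reconstruction error of every predicted frame.
import torch
import torch.nn as nn

class Dynamics(nn.Module):
    def __init__(self, obs_dim, act_dim, hid=128):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim + act_dim, hid)  # closed-loop belief update from (obs, action)
        self.pred = nn.GRUCell(act_dim, hid)            # open-loop rollout driven by actions only
        self.dec = nn.Linear(hid, obs_dim)              # decode a frame from the latent state

    def overshoot_loss(self, obs, act, K=5):
        """obs: [T, B, obs_dim], act: [T, B, act_dim]; sum of K-step prediction errors."""
        T, B, _ = obs.shape
        h = obs.new_zeros(B, self.cell.hidden_size)
        loss = obs.new_zeros(())
        for t in range(T - 1):
            h = self.cell(torch.cat([obs[t], act[t]], -1), h)  # belief after observing step t
            g = h
            for k in range(t + 1, min(t + 1 + K, T)):
                loss = loss + (self.dec(g) - obs[k]).pow(2).mean()  # predict frame k from the step-t belief
                g = self.pred(act[k], g)                            # advance open-loop with the action taken at k
        return loss
```

The same loop carries over to a stochastic model by replacing the deterministic cells with distributions over latents and the MSE with a likelihood plus KL objective, which is the setting where the rebuttal argues overshooting helps most.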
Reviews: Shaping Belief States with Generative Environment Models for RL
Post-rebuttal update: I appreciate the additional explanation of why overshooting is needed in empirical methods, and the clarity of the response regarding stochastic models. The issue I took with Sec 2.2 was the claim that next-step prediction is insufficient to produce belief states, which is only a matter of approximation error when dealing with empirical results. This is not clearly explained in the paper, but it is clarified much more nicely in the rebuttal. The misunderstanding would cause me to raise my score from a 3 to a 4, but I still do not find this paper worthy of acceptance. I do not think the insights are particularly surprising, and it seems the sole merit of the paper is an empirical one, impressive because of its performance on complex tasks.
This paper examines the use of generative models for developing representations to improve data efficiency in RL. Specifically, the authors use a generative model that is trained to predict multiple frames into the future (overshooting), and they show that when the model is stochastic (but not deterministic), overshooting leads to useful representations of the environment that can improve RL efficiency. The reviews on this paper were fairly divergent in the first round. Two of the reviewers liked the paper, but one felt it did not provide truly novel contributions and only brought together previously proposed ideas for using predictive training to improve RL representations. In discussion, the reviewers concluded that it does demonstrate the utility of overshoot prediction for stochastic models and that an empirical demonstration of this kind can be useful.
Shaping Belief States with Generative Environment Models for RL
Gregor, Karol, Rezende, Danilo Jimenez, Besse, Frederic, Wu, Yan, Merzic, Hamza, Oord, Aaron van den
When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show that the learned representation captures the layout of the environment as well as the position and orientation of the agent. Our experiments show that the model substantially improves data-efficiency on a number of reinforcement learning (RL) tasks compared with strong model-free baseline agents.
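One standard way to substantiate the claim that a learned belief state captures the environment layout and agent pose is a probing experiment: freeze the representation and regress ground-truth position and orientation from it. The sketch below assumes precomputed `beliefs` and pose `targets` tensors; these names and the linear-probe setup are our hypothetical illustration, not an artifact of the paper.

```python
# Hypothetical linear probe: if a frozen belief vector linearly predicts the
# agent's pose, the representation plausibly encodes position and orientation.
import torch
import torch.nn as nn

def fit_pose_probe(beliefs, targets, epochs=200, lr=1e-2):
    """beliefs: [N, D] frozen belief vectors; targets: [N, 4] agent pose,
    e.g. (x, y, sin(theta), cos(theta)). Returns the probe and final MSE."""
    probe = nn.Linear(beliefs.shape[1], targets.shape[1])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = (probe(beliefs.detach()) - targets).pow(2).mean()  # beliefs stay frozen
        loss.backward()
        opt.step()
    return probe, loss.item()
```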
Action-Sufficient State Representation Learning for Control with Structural Constraints
Huang, Biwei, Lu, Chaochao, Leqi, Liu, Hernández-Lobato, José Miguel, Glymour, Clark, Schölkopf, Bernhard, Zhang, Kun
Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and using a representation that contains the essential and sufficient information required by downstream decision-making tasks helps improve computational efficiency and generalization in those tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space, improving sample efficiency.
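As a rough illustration of the sequential-VAE component, the sketch below implements a single-step ELBO for a latent-state model with an action-conditioned learned prior. It is a generic sequential VAE step under our own simplifying assumptions (diagonal Gaussians, linear networks, MSE reconstruction), not the authors' structured model with its ASR constraints; `SeqVAEStep` and all layer names are illustrative.

```python
# Illustrative one-step sequential VAE update: encode the observation into a
# latent, decode it back, and regularize the posterior toward a learned prior
# conditioned on the previous latent and action.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqVAEStep(nn.Module):
    def __init__(self, obs_dim, act_dim, z_dim=32):
        super().__init__()
        self.enc = nn.Linear(obs_dim + z_dim + act_dim, 2 * z_dim)  # posterior q(z_t | o_t, z_{t-1}, a_{t-1})
        self.prior = nn.Linear(z_dim + act_dim, 2 * z_dim)          # prior p(z_t | z_{t-1}, a_{t-1})
        self.dec = nn.Linear(z_dim, obs_dim)                        # likelihood p(o_t | z_t)

    def forward(self, o_t, z_prev, a_prev):
        q_mu, q_logvar = self.enc(torch.cat([o_t, z_prev, a_prev], -1)).chunk(2, -1)
        p_mu, p_logvar = self.prior(torch.cat([z_prev, a_prev], -1)).chunk(2, -1)
        z_t = q_mu + torch.randn_like(q_mu) * (0.5 * q_logvar).exp()  # reparameterized sample
        recon = F.mse_loss(self.dec(z_t), o_t)  # Gaussian log-likelihood up to a constant
        # KL(q || p) between diagonal Gaussians, summed over latent dims
        kl = 0.5 * ((q_logvar - p_logvar).exp()
                    + (q_mu - p_mu).pow(2) / p_logvar.exp()
                    - 1 + p_logvar - q_logvar).sum(-1).mean()
        return recon + kl, z_t
```

Unrolled over a trajectory (starting from, say, `z_prev = torch.zeros(B, z_dim)`), the per-step losses sum to a sequence ELBO, and the latent rollout under the prior is what enables learning from imagined outcomes in the compact latent space.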